Feasibility of cross-vendor linkage of ophthalmic images with electronic health record data: an analysis from the IRIS Registry®

Abstract Purpose To link compliant, universal Digital Imaging and Communications in Medicine (DICOM) ophthalmic imaging data at the individual patient level with the American Academy of Ophthalmology IRIS® Registry (Intelligent Research in Sight). Design A retrospective study using de-identified EHR registry data. Subjects, Participants, Controls IRIS Registry records. Materials and Methods DICOM files of several imaging modalities were acquired from two large retina ophthalmology practices. Metadata tags were extracted and harmonized to facilitate linkage to the IRIS Registry using a proprietary, heuristic patient-matching algorithm, adhering to HITRUST guidelines. Linked patients and images were assessed by image type and clinical diagnosis. Reasons for failed linkage were assessed by examining patients' records. Main Outcome Measures Success rate of linking clinicoimaging and EHR data at the patient level. Results A total of 2 287 839 DICOM files from 54 896 unique patients were available. Of these, 1 937 864 images from 46 196 unique patients were successfully linked to existing patients in the registry. After removing records with abnormal patient names and invalid birthdates, the successful linkage rate was 93.3% for images. 88.2% of all patients at the participating practices were linked to at least one image. Conclusions and Relevance Using identifiers from DICOM metadata, we created an automated pipeline to connect longitudinal real-world clinical data comprehensively and accurately to various imaging modalities from multiple manufacturers at the patient and visit levels. The process has produced an enriched and multimodal IRIS Registry, bridging the gap between basic research and clinical care by enabling future applications in artificial intelligence algorithmic development requiring large linked clinicoimaging datasets.


Introduction
Ophthalmic diagnostic imaging is indispensable to modern clinical practice. Commonly used modalities include optical coherence tomography (OCT), anterior segment and fundus photography, fundus autofluorescence (FAF), fluorescein angiography (FA), and indocyanine green angiography (ICGA). Additionally, imaging is becoming increasingly important for eye screening examinations performed in nonophthalmology practices to identify patients at the earliest stages of potentially blinding diseases, such as diabetic retinopathy, and as adjuncts when in-person clinical evaluation is difficult or not possible. 1 These instruments allow for more accurate diagnosis and monitoring of ophthalmic conditions and have revolutionized disease diagnosis and staging over the past several decades. 2 Modern clinical decision making is also reliant on multifaceted data sources such as these. 3,4 There is an abundance of imaging datasets available in ophthalmology, 5 and electronic health record (EHR) registries are similarly rapidly growing in number. While there is significant scientific potential in combining these two disparate information sources, prior obstacles in these efforts have been substantial. 5 For example, there is often little and variable compliance by different device manufacturers with the global Digital Imaging and Communications in Medicine (DICOM) standards, resulting in difficulties in metadata harmonization. 6 Additionally, an accurate and reliable methodology is needed for matching patients between two different data sources. Above all, any methodology needs to be conducted in a highly secure manner to protect private patient information.
The American Academy of Ophthalmology IRIS® Registry (Intelligent Research in Sight) is the nation's first and world's largest comprehensive eye disease clinical database with over 70 million unique patients and over 12 000 contributing ophthalmologists and is continually growing. In this study, we retrieved and integrated patient-level DICOM file metadata from Heidelberg Engineering (Heidelberg, Germany) and Optos (Dunfermline, United Kingdom) devices and linked them with corresponding clinical and demographic data in the IRIS Registry. The purpose of this study was to assess the feasibility of creating an accurate methodology for successful linkage of clinical and imaging data through the matching of metadata from images to patient demographic data and to evaluate the performance of this linkage. Accurate linkage across different imaging modalities in adherence with the DICOM standard will support the use of these datasets for research and other secondary uses and has the potential to support development of artificial intelligence (AI) algorithms to help answer important questions about eye diseases and conditions.

Methods
This was a retrospective cohort study of patients included in the IRIS Registry, the world's largest ophthalmology registry. The IRIS Registry and its use for research purposes have been described in detail previously. 7 Data stored within the IRIS Registry are deidentified and compliant with the Health Insurance Portability and Accountability Act. All data and analysis adhered to HITRUST 8 and Mirador 9 guidelines for privacy and confidentiality. The Western Institutional Review Board Copernicus Group (WCG) reviewed and approved this project and determined that due to the deidentified, retrospective, registry-based nature of our project, written informed consent was waived for this study. 10 All research adhered to the tenets of the Declaration of Helsinki.
Imaging data were acquired by two practices with Optos® 200Tx, Optos® P200DTx, and Heidelberg Spectralis® and comprehensively normalized and standardized before linkage with EHR data. A format conversion was conducted first to transform manufacturer-specific native imaging data to the standard DICOM format with a licensed Heidelberg proprietary software (HEYEX 2). All converted DICOM files followed DICOM Supplement 91 and Supplement 110 standards. 11,12 After all imaging data were standardized to the DICOM format, their metadata were extracted and harmonized to facilitate linking with EHR data. With different imaging data acquired from various modalities and machines, the first two levels of metadata were extracted and concatenated with the following categories of fields: patient, study, series, and image. With the available identifiers extracted from the metadata, a custom linkage algorithm was developed to link patients in the imaging database with patients in the IRIS Registry using the following identifiers: name, gender, medical record number (MRN), birthdate, and practice-related locations. Four sets of combinations were used to determine linkage as follows: location, name, birthdate, and MRN; location, name, birthdate, and gender; location, name, and MRN; and location, birthdate, and MRN. A pair of linkages was considered successful when any set of combinations was linked. Graph theory matching was also implemented to ensure efficient linking. In addition to imaging data, OCT-related key measurements in encapsulated exported PDF files were also obtained and stored in the DICOM format. For each key measurement DICOM file, a service object pair instance unique identifier was provided to link measurements to OCT scans, which had patient information and were linked to the IRIS Registry.
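The four identifier combinations described above can be illustrated with a minimal Python sketch. This is not the proprietary algorithm used in the study: the `PatientRecord` class, the `_norm` normalization (whitespace collapsing and upper-casing), and the rule ordering are assumptions introduced only for illustration, and the graph theory matching step is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical container for the identifiers named in the text."""
    location: Optional[str]
    name: Optional[str]
    birthdate: Optional[str]  # assumed to be an ISO date string
    gender: Optional[str]
    mrn: Optional[str]


# The four identifier combinations listed in the Methods.
RULES = (
    ("location", "name", "birthdate", "mrn"),
    ("location", "name", "birthdate", "gender"),
    ("location", "name", "mrn"),
    ("location", "birthdate", "mrn"),
)


def _norm(value: Optional[str]) -> Optional[str]:
    """Assumed normalization: collapse whitespace and ignore case."""
    if value is None:
        return None
    cleaned = " ".join(value.split()).upper()
    return cleaned or None


def matching_rule(a: PatientRecord, b: PatientRecord) -> Optional[int]:
    """Return the index of the first rule on which both records agree."""
    for i, fields in enumerate(RULES):
        pairs = [(_norm(getattr(a, f)), _norm(getattr(b, f))) for f in fields]
        # Every field in the rule must be present and equal in both records.
        if all(x is not None and x == y for x, y in pairs):
            return i
    return None


def is_linked(a: PatientRecord, b: PatientRecord) -> bool:
    """A pair is linked when any one of the four combinations matches."""
    return matching_rule(a, b) is not None
```

In this sketch a missing identifier never counts as a match, so a record lacking both MRN and gender can only fail; whether the study's algorithm treats missing values the same way is not stated in the text.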
After successful linkage, de-identification of the imaging data was conducted in the same way as de-identification of the EHR data. DICOM metadata fields representing protected health information (PHI) were identified and masked in accordance with an expert-determined certification of the data model (Supplementary Figure S1). 10 The tag names corresponding to the masked PHI values were still preserved to ensure conformance to the DICOM standard and message integrity. The patient ID field was replaced by an anonymous identifier that was tokenized by a proprietary algorithm.
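The masking step can be sketched as follows, with the metadata held as a plain dictionary keyed by DICOM keyword. The list of PHI keywords, the empty-string mask, and the keyed-hash tokenization are all assumptions made for illustration; the study's certified data model and its proprietary tokenization algorithm are not described in the text.

```python
import hashlib
import hmac

# Illustrative subset of PHI keywords; the certified list is not given here.
PHI_KEYWORDS = {"PatientName", "PatientBirthDate", "PatientAddress"}


def tokenize(patient_id: str, secret: bytes) -> str:
    """Stand-in for the proprietary tokenization: a truncated keyed hash."""
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(metadata: dict, secret: bytes, mask: str = "") -> dict:
    """Mask PHI values while keeping every tag name in place."""
    out = {}
    for keyword, value in metadata.items():
        if keyword == "PatientID":
            # Replace the identifier with an anonymous token.
            out[keyword] = tokenize(str(value), secret)
        elif keyword in PHI_KEYWORDS:
            # The tag is preserved; only its value is masked.
            out[keyword] = mask
        else:
            out[keyword] = value
    return out
```

Because the same secret always yields the same token, images from one patient remain grouped after de-identification, which is the property the linked dataset relies on.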
To explore the cause of a failed linkage, subsequent analysis was performed to assess the number of patients and images with the following criteria: patient with a number or a special character in their name; patient with an abnormal birthdate, defined as before January 1, 1900, or after June 1, 2020 (last date of imaging acquisition); and patient with the phrase "test" as part of their name. Patients who fit multiple criteria were counted only once.
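These criteria can be expressed as a small classifier. The sketch below assumes a particular set of permitted name characters (letters, whitespace, hyphen, apostrophe, period, comma, and the DICOM component separator `^`) and a fixed precedence among the criteria so that each patient is counted once; neither detail is specified in the text.

```python
import re
from datetime import date
from typing import Optional

EARLIEST = date(1900, 1, 1)  # birthdates before this are abnormal
LATEST = date(2020, 6, 1)    # last date of imaging acquisition

# Assumed set of characters that do not count as "special".
ALLOWED_NAME = re.compile(r"[A-Za-z\s\-'.,^]*")


def failure_category(name: str, birthdate: date) -> Optional[str]:
    """Assign at most one category so each patient is counted once."""
    # Simple substring check; it would also flag a real surname that
    # happens to contain "test".
    if "test" in name.lower():
        return "test_patient"
    # Digits or characters outside the allowed set. Roman numerals such
    # as "III" are letters and therefore pass this check.
    if any(ch.isdigit() for ch in name) or not ALLOWED_NAME.fullmatch(name):
        return "abnormal_name"
    if birthdate < EARLIEST or birthdate > LATEST:
        return "abnormal_birthdate"
    return None
```

For example, `failure_category("Doe^John III", date(1970, 1, 1))` returns `None`, whereas a name containing a digit returns `"abnormal_name"`.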
With the successfully linked clinicoimaging dataset, a series of descriptive statistics was generated by each image type and by specific disease diagnosis. Image types were defined by the metadata generated by the device manufacturer, including OCT, fundus color photography, FAF, FA, ICGA, infrared reflectance (IR), and blue reflectance (BR). Disease diagnoses were defined by International Classification of Diseases (ICD) codes, including diabetic retinopathy with macular edema (DME), diabetic retinopathy without macular edema, exudative age-related macular degeneration (AMD), nonexudative AMD, geographic atrophy (GA), glaucoma, retinal vascular occlusions, choroidal disorders, and hereditary retinal dystrophy. Specific ICD codes used for each diagnosis are listed in Supplementary Table S1. To avoid confusion and complexity, diagnosis in the clinical structured data was linked to patients in the imaging dataset on the patient level, and then all imaging visits and DICOM images were counted for each patient. Patients with multiple diagnoses were allowed to be counted repeatedly among different diagnoses when applicable. Moreover, the relationship between time and the linkage success rate was analyzed. Patients were grouped by the year of their last imaging visit, and the linkage success rate was calculated for each group.
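The time-based grouping can be sketched as below. The input shape (one pair of visit dates and a linked flag per patient) and the single-year grouping are assumptions for illustration, and the toy records are invented solely to show the calculation.

```python
from collections import defaultdict
from datetime import date


def linkage_rate_by_last_visit_year(patients):
    """patients: iterable of (visit_dates, linked) pairs, one per patient.

    Returns {year: percentage of patients in that group who were linked}.
    """
    totals = defaultdict(int)
    linked_counts = defaultdict(int)
    for visit_dates, linked in patients:
        year = max(visit_dates).year  # year of the last imaging visit
        totals[year] += 1
        if linked:
            linked_counts[year] += 1
    return {y: 100.0 * linked_counts[y] / totals[y] for y in sorted(totals)}


# Invented toy data, not study data.
toy = [
    ([date(2013, 5, 1), date(2014, 2, 3)], False),
    ([date(2014, 7, 9)], True),
    ([date(2015, 1, 1), date(2020, 3, 4)], True),
    ([date(2020, 1, 15)], True),
]
rates = linkage_rate_by_last_visit_year(toy)
```

Grouping on the last visit rather than on every visit places each patient in exactly one group, so the per-group rates are computed over disjoint sets of patients.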

Results
In total, we acquired 2 287 839 DICOM images from 250 954 clinical visits of 54 896 patients. Of these, we linked 1 937 864 DICOM images from 221 079 clinical visits of 46 196 patients to the IRIS Registry. These images included OCT, fundus color photography, FAF, FA, ICGA, IR, and BR. The initial linkage success rate of EHR patient records to the IRIS Registry patient records was 84.2%. A total of 84.7% of all images were able to be linked to a patient record in the IRIS Registry. Table 1 describes the patient demographics of linked and unlinked records. Table 2 shows the number of patients, visits, and images as well as follow-up statistics by diagnosis, and the distribution of patients, visits, and images by image type and device manufacturer. Most unique patients with images in the cohort had a diagnosis of glaucoma, followed by diabetic retinopathy without, and with, macular edema. Patients with non-exudative age-related macular degeneration had the greatest number of unique clinic visits, followed by exudative age-related macular degeneration and glaucoma. There were comparable follow-up times available for all diagnoses, ranging from 1.5 to 2.6 years. With respect to image modalities, IR was the most frequently used, accounting for 40 121 patients, 204 674 visits, and 515 650 unique images generated. With regard to images by vendor, Heidelberg was the most frequently used to obtain patient images in each category except for color fundus photos (Optos, 222 558). Of note, Heidelberg instruments included in the study were not designed or indicated for color fundus photography. Furthermore, we analyzed the linkage success rate of images in relation to time, namely the year of patients' last imaging visit, in Table 3. Interestingly, the linkage success rate was higher for patients whose last imaging visit occurred more recently (2014-2015: 66.53%; 2020-2021: 94.96%). Table 3 also shows the success rate of linkage by imaging type, with linkage success varying from 79% to 91% depending on the type of imaging. Supplementary Table S2 describes the same information in the unlinked records.
In the unlinked cohort of 8700 patients, an exploratory analysis was done to explain the failed linkage. Table 4 lists the number of unlinked images and patients due to abnormal patient name usage and abnormal patient birthdate usage. Abnormal patient name usage includes names with numbers (excluding Roman numerals) and special characters, as well as test patients. Both the patient count and image count were mutually exclusive among rows. After removing these records, our successful linkage rate improved to 93.3% for images and 88.2% for patients, as illustrated in Figure 1.
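The initial linkage rates and the size of the unlinked cohort follow directly from the counts reported above, as the short check below shows. The adjusted rates (93.3% and 88.2%) depend on the exclusion counts in Table 4 and are not reproduced here.

```python
# Counts reported in the Results.
total_patients, linked_patients = 54_896, 46_196
total_images, linked_images = 2_287_839, 1_937_864

# Initial linkage success rates, as percentages rounded to one decimal.
patient_rate = round(100 * linked_patients / total_patients, 1)
image_rate = round(100 * linked_images / total_images, 1)

# Size of the unlinked cohort examined in the failure analysis.
unlinked_patients = total_patients - linked_patients

print(patient_rate, image_rate, unlinked_patients)
```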

Discussion
Here we describe a process for leveraging imaging metadata standards to augment an EHR registry with images in a secure and scalable way. Using identifiers from DICOM metadata, we created an automated pipeline to connect longitudinal real-world clinical data comprehensively and accurately to various image modalities from two device manufacturers at the patient and visit levels. Initially, we were able to match 46 196 patients who had a record linked to the IRIS Registry with a success rate of 84.2%. This resulted in 1 937 864 DICOM images matched with a success rate of 84.7%. When imaging metadata with abnormal names or invalid birthdates were excluded from analysis, our overall success rate improved to 88.2% of IRIS Registry-linked patients and 93.3% of all images. The most common reasons for exclusion were patient records that likely represented "test patients" and those with implausible birthdates (before 1900). This suggests that similar methodology may need to be employed in the future to create other large linked clinicoimaging datasets because this same issue is likely to arise.
After successful patient-level linkage, all DICOM images from linked patients were considered linked as well, though it is possible some imaging visits might not have a corresponding clinical visit in the IRIS Registry. This is largely due to the timeframe differences between the IRIS Registry and our DICOM imaging dataset. The IRIS Registry was launched in March 2014, 7 while some of our available images date back to 2010. An alternative approach would be further linking of imaging visits with clinical visits in the IRIS Registry after successful patient linkage. We did not choose this approach in order to maximize the number of images linked. Because many of the common ocular diseases are age-related diseases and chronic in nature, patients with a diagnosis (eg, AMD) in 2014 would also have a high likelihood of having the same disease from an image obtained on a date prior to the start of the IRIS Registry. Moreover, in ophthalmology, images are commonly used as evidence for diagnosis. Therefore, in the absence of clinical information, clinical diagnosis could also be inferred from images manually or using machine learning algorithms. 13,14 In Table 2, the diagnosis was linked on the patient level rather than on the visit level for the same reasons as noted above.
Multi-source data linkage is an important topic in healthcare because it provides several advantages, such as understanding complete patient journeys, cross-validation of diagnoses and procedures, and addressing knowledge gaps in patient care. In ophthalmology, imaging data are often considered the ground truth for diagnosis, and the successful linkage between imaging data and EHR data can work synergistically to create a cross-validated dataset. 15,16 With the increased popularity of image-based machine learning algorithms for disease detection, imaging data could also be used to enrich IRIS Registry EHR data, particularly for certain diseases that do not have well-defined ICD codes for diagnosis. This study included imaging data from two common ophthalmic imaging vendors, due to availability of this equipment at research sites. The success of our algorithm suggests the methodology will be scalable to other ophthalmic imaging vendors as well and will be a topic of future study.
There were some limitations to our study. Our algorithm was able to link between 84.2% and 88.2% of patient records, depending on the methodology used. Since only the four variables present in the DICOM metadata were used to match patient records, it is possible that clinical records may have been inaccurately attributed to imaging metadata due to algorithmic error. Additionally, many images had incomplete information within their metadata (eg, missing names, birthdates, or MRNs). Because our algorithmic approach relies on this information, such images could not be matched using this methodology.
The ability for EHRs to directly transfer demographic information to ophthalmic imaging equipment is a feature of those systems with the DICOM Modality Worklist feature. While this feature is present with some DICOM-compliant EHR and imaging systems, clinics that still use software versions prior to the implementation do not have this capability. Additionally, many clinics do not have imaging equipment connected directly to EHR systems, making this workflow not possible in these instances. As such, methodology such as that described here may still be necessary to successfully link imaging data to patient records. As indicated in Table 3, the linkage rate increases with time, possibly due to increased compliance with such features and standards. At this time, the IRIS Registry does not contain information on whether EHRs or imaging equipment meet specific DICOM compliance standards, but this will be a topic of future studies.
In summary, we created an accurate and scalable solution for the creation of a linked clinicoimaging dataset from the largest clinical ophthalmology registry, as well as various imaging modalities from two different ophthalmic imaging vendors. This curation process will serve as the basis for an enriched and multimodal IRIS Registry, enabling enhanced research and advanced analytics. As imaging dependence and digital data capture grow, compliance with standards such as DICOM will be critical to advancing data-driven clinical insights to improve quality of care and enable novel ophthalmic drug and device development approaches. Data harmonization between different ophthalmic device manufacturers and modalities in the past has been challenging due to low compliance rates with the DICOM standards, and this remains true today. This presents hurdles to patient image linkage to clinical registries such as the IRIS Registry. Of note, the use of DICOM standards has also been crucial in building similar algorithms in other specialties, such as radiology. 17 The ability to employ accurate linkage methodology across different imaging instruments in adherence with DICOM standards will support scalable processes, and standards compliance will be important for work such as this in the future.

For example, with a linked clinicoimaging dataset, it would be possible to develop a GA diagnosis algorithm based on images because the GA-specific ICD-10 codes were not introduced to clinical usage until 2016. Clinical datasets are also necessary to support the development of AI algorithms because there are areas where the addition of images only may not be useful. Thus far, patient outcome research with the IRIS Registry has largely used clinical outcomes like visual acuity and intraocular pressure. With our linked clinicoimaging dataset, it would be possible to include image-derived outcomes, such as intraretinal fluid volume as well as subretinal fluid volume for DME and exudative AMD patients undergoing anti-vascular endothelial growth factor treatments, as well as GA lesion size for late-stage non-exudative AMD patients to monitor disease progression.
Note: Unlinked patients lack ethnicity and race information as this information was not recorded in the clinical structured data. Abbreviations: NA = not applicable, SD = standard deviation.
JAMIA Open, 2024, Vol. 7, No. 1

Table 2. Number of patients, visits, images, and follow-up statistics by diagnosis, and distribution of patients, visits, and images by image type and device manufacturer.

Table 3. Linkage success rate grouped by the year of patients' last imaging visit and imaging type.

Table 4. Failed linkage due to abnormal patient information.